9/27/2018

Kansas State University Statistics Seminar

Introduction

  • Climate change is well understood globally.

  • Climate change is less well understood locally.

  • Need for spatially explicit reconstructions of climate variables.

  • Problem: data sources are messy and noisy.

Local Prediction

Introduction

Predicting the future by learning from the past

Predicting the future by learning from the past

  • Vegetation composition and structure change from ice age to current period.

  • Using change in temperature to predict future vegetation change.


Predicting the future by learning from the past

  • Classify compositional and structural change.

Predicting the future by learning from the past

  • Ordered multi-logistic B-spline regression.


  • Learn vegetation change in structure/composition given temperature change.


  • Forecast future vegetation change.



Learning about the past

Climate proxy data

  • Many ecological and physical processes respond to climate over different time scales.
    • Tree rings, corals, forest landscapes, ice rings, lake levels, etc.


  • These processes are called climate proxies.
    • They are proxy measurements for unobserved climate.
    • Noisy and messy.
    • Respond to a wide variety of non-climatic signals.

Pollen Data

Pollen Data

Pollen Data

Fossil Pollen Data

Model Framework

  • Bayesian hierarchical model.

\(\begin{align*} [\mathbf{Z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{Z}] [\mathbf{Z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*}\)

Model Framework

  • Bayesian hierarchical model.

\(\begin{align*} \color{cyan}{[\mathbf{Z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}]} & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{Z}] [\mathbf{Z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*}\)

  • Posterior.

Model Framework

  • Bayesian hierarchical model.

\(\begin{align*} \color{cyan}{[\mathbf{Z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}]} & \propto \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{Z}]} [\mathbf{Z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*}\)

  • Posterior.

  • Data Model

Model Framework

  • Bayesian hierarchical model.

\(\begin{align*} \color{cyan}{[\mathbf{Z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}]} & \propto \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{Z}]} \color{blue}{[\mathbf{Z} | \boldsymbol{\theta}_P]} [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*}\)

  • Posterior.

  • Data Model.

  • Process Model.

Model Framework

  • Bayesian hierarchical model.

\(\begin{align*} \color{cyan}{[\mathbf{Z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}]} & \propto \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{Z}]} \color{blue}{[\mathbf{Z} | \boldsymbol{\theta}_P]} \color{orange}{[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P]} \end{align*}\)

  • Posterior.

  • Data Model.

  • Process Model.

  • Prior Model.

Data Model

\(\begin{align*} [\mathbf{Z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{Z}]} [\mathbf{Z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*}\)

Data model

  • Describes how the data are collected and observed.

  • Researchers take sediment samples from a lake.

  • Take 1cm\(^3\) cubes along the length of the sediment core.

  • In each cube, researcher counts the first \(N\) pollen grains and identifies to species.

  • Raw data are counts of each species.

Data Model

For location \(\mathbf{s}\) and time \(t\),

\(\begin{align*} \mathbf{y} \left( \mathbf{s}_i, t \right) & = \left( y_{1} \left( \mathbf{s}_i, t \right), \ldots, y_{d} \left( \mathbf{s}_i, t \right) \right)' \end{align*}\)

is an observation of a \(d\)-dimensional compositional count.

  • \(y_{j} \left( \mathbf{s}_i, t \right)\) is the count of species \(j\) in the sample at location \(\mathbf{s}_i\) and time \(t\).

  • Compositional count data.
  • Total count is not informative of the absolute composition.
  • Informative of the relative proportions \(p_{j} \left( \mathbf{s}_i, t \right)\) only.

Data Model

  • Compositional count vector \(\mathbf{y} \left( \mathbf{s}_i, t \right)\) a function of latent proportions \(\mathbf{p}\left( \mathbf{s}_i, t \right)\).


\(\begin{align*} \mathbf{y}\left( \mathbf{s}_i, t \right) | \mathbf{p}\left( \mathbf{s}_i, t \right) & \sim \operatorname{Multinomial} \left( N\left( \mathbf{s}_i, t \right), \mathbf{p}\left( \mathbf{s}_i, t \right) \right) \end{align*}\)


  • \(N\left( \mathbf{s}_i, t \right) = \sum_{j=1}^d y_{j}\left( \mathbf{s}_i, t \right)\) is the total count observed (fixed and known) for observation at location \(\mathbf{s}_i\) and time \(t\).

  • Compositional count vector \(\mathbf{y} \left( \mathbf{s}_i, t \right)\) a function of latent proportions \(\mathbf{p}\left( \mathbf{s}_i, t \right)\).


Overdispersion

  • The pollen data are highly variable and overdispersed.


\(\begin{align*} \mathbf{p}\left( \mathbf{s}_i, t \right) | \boldsymbol{\alpha}\left( \mathbf{s}_i, t \right) & \sim \operatorname{Dirichlet} \left( \boldsymbol{\alpha}\left( \mathbf{s}_i, t \right) \right) \end{align*}\)


  • Marginalize out \(\mathbf{p} \left( \mathbf{s}_i, t \right)\) to get Dirichlet-multinomial.


\(\begin{align*} \mathbf{y}\left( \mathbf{s}_i, t \right) | \boldsymbol{\alpha}\left( \mathbf{s}_i, t \right) & \sim \operatorname{Dirichlet-Multinomial} \left( N\left( \mathbf{s}_i, t \right), \boldsymbol{\alpha}\left( \mathbf{s}_i, t \right) \right) \end{align*}\)


Overdispersion

  • We model the Dirichlet-multinomial count data using the log link function:


\(\begin{align*} \operatorname{log} \left( \boldsymbol{\alpha} \left( \mathbf{s}_i, t \right) \right) & = \mathbf{z}\left( \mathbf{s}_i, t \right) \boldsymbol{\beta}. \end{align*}\)


  • \(\mathbf{z}\left( \mathbf{s}_i, t \right)\)' is a \(q\)-dimensional vector of covariates.


  • \(\boldsymbol{\beta}\) is a \(q \times d\) dimensional matrix of regression coefficients.
    • For identifiability, fix the first intercept term \(\beta_{11} = 0\).

Functional Response

  • Can also estimate how the functional responses covary.
    • Let \(\boldsymbol{\Omega}\) be the functional correlations.
    • \(\boldsymbol{\Omega} = \mathbf{L}' \mathbf{L}\).


\(\begin{align*} \operatorname{log} \left( \boldsymbol{\alpha} \left( \mathbf{s}_i, t \right) \right) & = \mathbf{z}\left( \mathbf{s}_i, t \right) \boldsymbol{\beta} \mathbf{L}. \end{align*}\)


  • Represents sympatry/allopatry.


  • Can be used to learn about ecological relationships among/between species.

Calibration

  • The \(\mathbf{z} \left( \mathbf{s}_i, t \right)\)s are observed only at \(t\) = 1.


  • Calibration: Estimate \(\boldsymbol{\beta}\) using:
    • \(\left( \mathbf{y} \left( \mathbf{s}_1, 1 \right), \ldots, \mathbf{y} \left( \mathbf{s}_n, 1 \right) \right)'\) and
    • \(\left( \mathbf{z} \left( \mathbf{s}_1, 1 \right), \ldots, \mathbf{z} \left( \mathbf{s}_n, 1 \right) \right)'\).


  • Reconstruction:
    • Use estimated \(\boldsymbol{\beta}\)s and fossil pollen \(\mathbf{y} \left( \mathbf{s}, t \right)\) to predict unobserved \(\mathbf{z}\left( \mathbf{s}, t \right)\).


Calibration Model

Non-linear Calibration Model

  • Vegetation response to climate is non-linear.

\(\begin{align*} \operatorname{log} \left( \boldsymbol{\alpha} \left( \mathbf{s}_i, t \right) \right) & = \mathbf{B} \left( \mathbf{z}\left( \mathbf{s}_i, t \right) \right) \boldsymbol{\beta} \end{align*}\)


  • \(\mathbf{B} \left( \mathbf{z}\left( \mathbf{s}_i, t \right) \right)\) is a basis expansion of the covariates \(\mathbf{z}\left( \mathbf{s}_i, t \right)\).
    • Use B-splines or Gaussian Processes as a basis.
    • \(\mathbf{B} \left( \mathbf{z}\left( \mathbf{s}_i, t \right) \right)\) is random.
    • Computationally challenging.


  • Note that for \(t \neq 1\), the \(\mathbf{z} \left( \mathbf{s}_i, t \right)\)s are unobserved.


Non-linear Calibration Model

Process Model

\(\begin{align*} [\mathbf{Z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{Z}] \color{blue}{[\mathbf{Z} | \boldsymbol{\theta}_P]}[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*}\)

Process Model

Dynamic Model

  • We are interested in estimating the latent process \(\mathbf{z} \left( \mathbf{s}, t \right)\).


  • The model can accommodate:
  1. continuous vs. discrete space (geostatistical vs. CAR models).
  2. continuous vs. discrete time (stochastic process vs. AR models).


  • For now, we focus on continuous space and discrete time.

Dynamic Model

  • For \(\mathbf{z} \left(t \right) = \left( \mathbf{z} \left(\mathbf{s}_1, t \right)', \ldots, \mathbf{z} \left(\mathbf{s}_n, t \right)' \right)\), we assume:

\(\begin{align*} \mathbf{z} \left(t \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} & = \mathbf{M}\left(t\right) \left( \mathbf{z} \left(t-1 \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} \right) + \boldsymbol{\eta} \left(t \right) \end{align*}\)


  • \(\mathbf{M}(t) = \rho \mathbf{I}_n\) is a propagator matrix.

  • \(\mathbf{X} \left(t \right) \boldsymbol{\gamma}\) are the fixed effects from covariates like latitude, elevation, etc.

  • \(\boldsymbol{\eta} \left( t \right) \sim \operatorname{N} \left( \mathbf{0}, \mathbf{R}\left( \boldsymbol{\theta} \right) \right)\).

  • \(\mathbf{R} \left( \boldsymbol{\theta} \right)\) is a Mátern spatial covariance matrix with parameters \(\boldsymbol{\theta}\).

Elevation covariates

Scaling the process for big data

  • Define a set of spatial knot locations \(\mathbf{s}^{\star} = \left\{ \mathbf{s}_1^{\star}, \ldots, \mathbf{s}_m^{\star} \right\}\).

  • \(\boldsymbol{\eta}^{\star} \left( t \right) \sim \operatorname{N} \left( \mathbf{0}, \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right) \right)\).

  • \(\mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)\) is the spatial covariance defined at the knot locations \(\mathbf{s}^{\star}\).

  • The linear interpolator from observed location \(\mathbf{s}_i\) to knot location \(\mathbf{s}_j^{\star}\) is \(\mathbf{r} \left(\mathbf{s}_i, \mathbf{s}_j^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1}\) where \(\mathbf{r} \left(\mathbf{s}_i, \mathbf{s}_j^{\star} \right) = \operatorname{Cov} \left(\mathbf{s}_i, \mathbf{s}_j^{\star} \right)\)

Banerjee S, Gelfand AE, Finley AO and Sang H (2008). "Gaussian, predictive process models for large spatial data sets." Journal, of the Royal Statistical Society: Series B (Statistical, Methodology), 70(4), pp. 825-848.

Predictive Process

  • \(\boldsymbol{\eta} \left( t \right) \approx \mathbf{r} \left(\mathbf{s}, \mathbf{s}^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1} \tilde{\boldsymbol{\eta}} \left( t \right)\).

  • The predictive process can be shown to be the optimal predictor of the parent process \(\boldsymbol{\eta} \left( t \right)\) of dimension \(m\)

  • The dynamic climate process becomes

\(\begin{align*} \mathbf{z} \left(t \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} & \approx \mathbf{M}\left(t\right) \left( \mathbf{z} \left(t-1 \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} \right) + \mathbf{r} \left(\mathbf{s}, \mathbf{s}^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1} \boldsymbol{\tilde{\eta}} \left(t \right) \end{align*}\)

Time Uncertainty

  • Each fossil pollen observation includes estimates of time uncertainty.
    • The time of the observation is uncertain.
    • Weight the likelihoods according to age-depth model.
    • Posterior distribution of ages.


  • For each observation fossil pollen observation an age-depth model gives a posterior distribution over dates.
    • Define \(\omega \left(\mathbf{s}_i, t \right)\) as P(age \(\in (t-1, t)\)).
    • \([\mathbf{y} \left( \mathbf{s}_i, t \right) | \boldsymbol{\alpha} \left( \mathbf{s}_i, t \right) ] = \prod_{t=1}^T [\mathbf{y} \left( \mathbf{s}_i, t \right) | \boldsymbol{\alpha} \left( \mathbf{s}, t \right)]^{\omega_\left(\mathbf{s}_i, t \right)}\).

Implementation

gitHub package

devtools::install_github("jtipton25/BayesComposition")


  • Includes options for multiple models including:
    • mixture models.
    • different likelihoods and link functions.
    • correlations in functional response.


  • Code in C++ using Rcpp package for computation speed.

Computational Details

  • The model is fit in stages:
  1. Estimate the fixed effect of elevation.

  2. Estimate the spatial residual model.
    • Addresses multicollinearity.
    • Type III sums of squares.
  3. Fit Calibration model.

  4. Spatial Reconstruction.

Hefley TJ, Broms KM, Brost BM, Buderman FE, Kay SL, Scharf HR,, Tipton JR, Williams PJ and Hooten MB (2017). "The basis function, approach for modeling autocorrelation in ecological data.", Ecology, 98(3), pp. 632-646.

Estimation of \(\mathbf{z} \left(\mathbf{s}, t \right)\)

  • High-dimensional spatio-temporal process.
    • Inefficient to sample with block Metropolis.
    • Poor mixing of MCMC chains.


  • Non-linear transformation in the data model.
    • Difficult to use Kalman Filtering.


  • Particle Filtering Methods.
    • Difficult to implement, suffer from degeneracy.
    • Posterior can be multi-modal.

Estimation of \(\mathbf{z} \left(\mathbf{s}, t \right)\)

  • Elliptical Slice Sampling.
    • Assumes a Gaussian prior.
    • Difficult to implement in a DLM framework.
    • Requires no tuning.
    • Efficiently samples in high dimensions.
    • Easily explores multiple modes.


  • Fit using adaptive block Metropolis within Gibbs and Elliptical Slice Sampling algorithms.


  • Highly multi-modal posterior is efficiently explored within the sampler.


Murray I, Adams RP and MacKay DJC (2010). "Elliptical slice, sampling." In AISTATS, volume 13, pp. 541-548.

Estimation of \(\mathbf{z} \left(\mathbf{s}, t \right)\)

  • For Gaussian process expansion of \(\mathbf{z} \left( \mathbf{s}, t \right)\):
    • The latent climate states are inputs into the covariance function.
    • Covariance input locations (and distance) are random.
    • Unique computational challenge.
      • Total computational cost is prohibitive \(O(d \frac{(Tn)^3}{3})\).
  • Proposed solution:
    • Predictive process representation.
    • Reduced computation cost \(O(d T \frac{m^3}{3})\)

Tipton JR, Hooten MB, Nolan C, Booth RK and McLachlan J (2018)., "Multivariate Gaussian processes for inverse prediction using, compositional data." In Review.

Simulation Study

Simuated data

Simulated Reconstruction

Simulated Reconstruction Temporal Trend

Reconstruction

Reconstruction over time

Reconstruction Inference

  • Current methods are site-level "transfer function" methods.
    • These methods ignore elevation, temporal autocorrelation, and spatial autocorrelation.
    • Sensitive to the data.
    • Poor quantification of uncertainty.
    • Unclear how to choose among models.
  • The spatial method is statistically principled.
    • Has higher power.
    • Smaller uncertainties that change with data (sample size, signal coherence, etc.).
    • Can use model selection methods (information criterion, etc).

Competing Methods

Weighted averaging

  • Calibration:
    • Find the "optimum" climate state for species \(j\).

\(\begin{align*} \hat{\mu}_j = \frac{\sum_{i=1}^n y_{ij} z_i} {\sum_{i=1}^n y_{ij}} \end{align*}\)

  • Prediction:
    • Find the weighted optimum over all species

\(\begin{align*} \hat{z} = \frac{\sum_{j=1}^d y_{j} \hat{\mu}_j} {\sum_{i=1}^n y_{ij}} \end{align*}\)

Competing Methods : Weighted averaging

  • \(\hat{z}\) is a mean of means:
    • Strong shrinkage to the mean.
    • Bootstrap uncertainty estimates

Competing Methods : Weighted averaging

  • "Deshrinking"
    • Fit a regression model between \(z\) and \(\hat{z}\) in calibration data.
    • "Boost" the estimate using the regression model in the past.

Competing Methods : Modern Analog

  • k-nearest neighbors.


  • Neighborhood distance is based on similarity of composition vector.


  • Highly dependent on good neighbors (analogues).
    • Spatial model can "extrapolate" based on previously unseen compositions

Reconstruction Inference

Reconstruction Inference

Conclusion

Conclusion

Model framework opens the door to answering meaningful questions:

  • Do pollen distributions change with elevation?
    • Covariate-sensitive parameterizations.
  • Do pollen distributions change over space or time?
    • Regression coefficients vary over space/time.
  • How to combine multiple proxies (tree rings, pollen, etc)?
    • Each proxy gets its own data model.
    • Proxies link to dynamic space-time process.

Thanks for the attention

Trend Comparison